Take-home Exercise 2

Published

March 3, 2024

1. Overview

Dengue fever, which in severe cases develops into dengue hemorrhagic fever, is a prevalent mosquito-borne illness found in tropical and subtropical regions. It stems from acute dengue virus infection transmitted by female Aedes aegypti and Aedes albopictus mosquitoes. In 2015, Taiwan faced a severe outbreak, recording over 43,000 cases and 228 deaths. In subsequent years, annual cases remained below 200. However, in 2023, Taiwan saw a spike with 26,703 reported cases, of which over 25,000 were in Tainan City.

2. The Task

Our task is to explore whether the occurrence of dengue fever outbreaks in Tainan City is influenced by spatial factors alone or both spatial and temporal factors. If the outbreaks show dependence on space or both space and time, the goal is to identify clusters, outliers, and emerging hotspots/coldspots within the area.

3. The Data

This study will utilize two datasets, which are:

  • Geospatial data of the village boundaries of Taiwan in the format of an ESRI shapefile. The data is in the Taiwan Geographic Coordinate System.

  • Aspatial data of reported dengue cases in Taiwan since 1998.

4. Installation and Loading of Packages

The R packages that will be used in this study are:

  • sf: for importing, managing, and processing geospatial data

  • tidyverse: for performing data science tasks

  • tmap: for plotting cartographic quality static point patterns maps or interactive maps by using leaflet API

  • sfdep: for spatial dependence analysis in spatial econometrics

  • lubridate: provides functions for working with dates and times

  • plotly: for interactive and dynamic plotting in R

  • dplyr: for wrangling data

pacman::p_load(sf, tidyverse, tmap, sfdep, lubridate, plotly, dplyr)

5. Import Data

5.1 Importing geospatial data

The st_read() function comes from the sf package in R and is used to read spatial data from various formats. When using st_read() to read a shapefile, there is no need to explicitly specify the file format, as the appropriate driver is inferred from the data source.

tainan_sf <- st_read(dsn = "../../data/th2/data/geospatial", 
                layer = "TAINAN_VILLAGE")
Reading layer `TAINAN_VILLAGE' from data source 
  `C:\viddyasri\IS415-GAA\data\th2\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 649 features and 10 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 120.0269 ymin: 22.88751 xmax: 120.6563 ymax: 23.41374
Geodetic CRS:  TWD97

Now, to understand the shapefile imported, let’s retrieve information related to the data frame.

st_geometry(tainan_sf)
Geometry set for 649 features 
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 120.0269 ymin: 22.88751 xmax: 120.6563 ymax: 23.41374
Geodetic CRS:  TWD97
First 5 geometries:
head(tainan_sf, n=10)
Simple feature collection with 10 features and 10 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 120.0269 ymin: 22.93251 xmax: 120.2905 ymax: 23.17218
Geodetic CRS:  TWD97
      VILLCODE COUNTYNAME TOWNNAME VILLNAME        VILLENG COUNTYID COUNTYCODE
1  67000280002     臺南市   歸仁區   六甲里    Liujia Vil.        D      67000
2  67000350032     臺南市   安南區   青草里   Qingcao Vil.        D      67000
3  67000150009     臺南市   七股區   溪南里     Xinan Vil.        D      67000
4  67000150010     臺南市   七股區   七股里      Qigu Vil.        D      67000
5  67000150008     臺南市   七股區   龍山里  Longshan Vil.        D      67000
6  67000150017     臺南市   七股區   中寮里 Zhongliao Vil.        D      67000
7  67000150004     臺南市   七股區   篤加里     Dujia Vil.        D      67000
8  67000150007     臺南市   七股區   塩埕里 Yancheng  Vil.        D      67000
9  67000150022     臺南市   七股區   三股里     Sangu Vil.        D      67000
10 67000150023     臺南市   七股區   十份里    Shifen Vil.        D      67000
   TOWNID TOWNCODE NOTE                       geometry
1     D33 67000280 <NA> POLYGON ((120.2725 22.95868...
2     D06 67000350 <NA> POLYGON ((120.1176 23.08387...
3     D22 67000150 <NA> POLYGON ((120.121 23.1355, ...
4     D22 67000150 <NA> POLYGON ((120.1312 23.1371,...
5     D22 67000150 <NA> POLYGON ((120.0845 23.13503...
6     D22 67000150 <NA> POLYGON ((120.126 23.16917,...
7     D22 67000150 <NA> POLYGON ((120.1585 23.15376...
8     D22 67000150 <NA> POLYGON ((120.0636 23.17069...
9     D22 67000150 <NA> POLYGON ((120.0422 23.11545...
10    D22 67000150 <NA> POLYGON ((120.1201 23.09364...

We will use plot() to create basic initial visualizations of the geospatial data and tmap for a more detailed view of the distribution of Tainan's villages and counties.

plot(tainan_sf)

tmap_mode('plot')
tm_shape(tainan_sf) +
  tm_polygons("TOWNID")

Important

The number of levels of the variable “TOWNID” is 37, which is larger than the maximum number of categories (which is 30), thus the levels are combined.
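If all 37 levels should be displayed individually, the cap can be raised through tmap's global options; a minimal sketch, assuming tmap v3's max.categories option:

```r
# Raise the category cap so every TOWNID level gets its own class
tmap_options(max.categories = 37)

tm_shape(tainan_sf) +
  tm_polygons("TOWNID")

# restore the default options afterwards
tmap_options_reset()
```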

5.2 Importing aspatial data

The read_csv() function comes from the readr package, which is loaded as part of tidyverse. It is employed to read tabular data in CSV (Comma-Separated Values) format. Unlike st_read(), the full file name, including its extension, must be supplied.

dengue_csv <- read_csv("../../data/th2/data/aspatial/Dengue_Daily.csv")

For the aspatial data, only three columns are of relevance to this exercise. The three columns are

  • 發病日: Onset date
  • 最小統計區中心點X: x-coordinate
  • 最小統計區中心點Y: y-coordinate

Therefore, we will be extracting only those three columns and renaming them for ease of further evaluation.

dengue_csv <- dengue_csv %>%
  select(發病日, 最小統計區中心點X, 最小統計區中心點Y) %>%
  rename(ONSET_DATE = "發病日", X_COORDINATE = "最小統計區中心點X", Y_COORDINATE = "最小統計區中心點Y")

Now, to understand the confined CSV created, let’s retrieve information related to it.

head(dengue_csv, n=10)
# A tibble: 10 × 3
   ONSET_DATE X_COORDINATE  Y_COORDINATE
   <date>     <chr>         <chr>       
 1 1998-01-02 120.505898941 22.464206650
 2 1998-01-03 120.453657460 22.466338948
 3 1998-01-13 121.751433765 24.749214667
 4 1998-01-15 120.338158907 22.630316700
 5 1998-01-20 121.798235373 24.684507639
 6 1998-01-22 None          None        
 7 1998-01-23 121.547480075 24.982467229
 8 1998-01-26 121.500936346 25.139302723
 9 1998-02-11 120.209079313 22.978859098
10 1998-02-16 120.313207729 22.724216594

From the above, we can see that some rows are missing values. We will look further into this matter in the data preparation section.

6. Data Preparation - Study Area Layer

In this section, we need to prepare a study area layer consisting of sf polygon features at the village level that include specific counties, namely D01, D02, D04, D06, D07, D08, D32, and D39 of Tainan City, Taiwan.

6.1 Handle invalid geometries

First, we check for invalid geometries and handle them accordingly.

length(which(st_is_valid(tainan_sf) == FALSE))
[1] 0

As shown above, there are no invalid geometries.

6.2 Check projection system

In this exercise, we aim to utilize Taiwan’s national geodetic datum, commonly known as TWD97. Here, we verify whether our data frame is aligned with this coordinate reference system.

st_crs(tainan_sf)
Coordinate Reference System:
  User input: TWD97 
  wkt:
GEOGCRS["TWD97",
    DATUM["Taiwan Datum 1997",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["Taiwan, Republic of China - onshore and offshore - Taiwan Island, Penghu (Pescadores) Islands."],
        BBOX[17.36,114.32,26.96,123.61]],
    ID["EPSG",3824]]

Upon inspecting the data frame, we can determine that it is utilizing the TWD97 geographic coordinate system with the EPSG code 3824. Since the other layers in this study will be aligned to the same system, there is no requirement for a coordinate transformation.

6.3 Preparing the study area layer

To confine the study area to the specified counties, we first identify the town IDs to include in the filtering process.

town_ids_to_include <- c("D01", "D02", "D04", "D06", "D07", "D08", "D32", "D39")

Subsequently, we create a new data frame by extracting rows from the original data frame that meet the condition of having the town IDs specified for inclusion.

tainan_sf_confined <- tainan_sf %>%
  filter(TOWNID %in% town_ids_to_include)

Now, we plot to see if we’ve successfully confined the data frame to the specified counties and to visually represent the distribution of these counties on the map.

tmap_mode('plot')
tm_shape(tainan_sf_confined) +
  tm_polygons("TOWNID")

As seen in the tmap above, we have successfully confined the data frame to the specified counties.
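Besides the visual check, the filter can be verified programmatically; a small sketch that tabulates the surviving town IDs:

```r
# Villages per remaining town ID; only the eight specified IDs should appear
table(tainan_sf_confined$TOWNID)

# Should return character(0) if no unexpected IDs survived the filter
setdiff(unique(tainan_sf_confined$TOWNID), town_ids_to_include)
```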

7. Data Preparation - Dengue Fever Layer

In this section, we need to prepare an sf point layer representing dengue fever cases within the designated study area. The inclusion criteria for these cases entail confining them to epidemiology weeks 31-50 of the year 2023.

7.1 Filtering data

Given that our analysis is specifically focused on the year 2023, let us first filter the dataset to isolate information from that specific year. This is crucial, as the existing CSV file contains data spanning from 1998, which is beyond the scope of our current investigation.

First, we transform the “ONSET_DATE” column into a date object.

dengue_csv$ONSET_DATE <- as.Date(dengue_csv$ONSET_DATE)

Subsequently, we proceed to filter and extract the data entries corresponding to the year 2023.

dengue_csv_confined <- dengue_csv[dengue_csv$ONSET_DATE >= as.Date("2023-01-01") & dengue_csv$ONSET_DATE <= as.Date("2023-12-31"), ]

Following that, we extract the week number from the “ONSET_DATE” column and use it to filter the dengue cases down to epidemiology weeks 31 through 50.

dengue_csv_confined$EPIDEMIOLOGY_WEEK <- week(dengue_csv_confined$ONSET_DATE)
dengue_csv_confined <- dengue_csv_confined %>%
  filter(EPIDEMIOLOGY_WEEK >= 31 & EPIDEMIOLOGY_WEEK <= 50)
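One caveat: lubridate’s week() simply counts blocks of seven days from 1 January, whereas epiweek() and isoweek() implement the CDC epidemiological week (starting Sunday) and the ISO 8601 week (starting Monday), which can differ by one for dates near week boundaries. A sketch of the comparison:

```r
library(lubridate)

d <- as.Date("2023-07-30")
week(d)     # blocks of 7 days counted from 1 January
epiweek(d)  # CDC epidemiological week, weeks start on Sunday
isoweek(d)  # ISO 8601 week, weeks start on Monday
```

If strict epidemiological weeks are required, epiweek() may be the more appropriate choice; this exercise uses week().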

Let’s take a look at what we have now:

head(dengue_csv_confined, n=10)
# A tibble: 10 × 4
   ONSET_DATE X_COORDINATE  Y_COORDINATE EPIDEMIOLOGY_WEEK
   <date>     <chr>         <chr>                    <dbl>
 1 2023-07-30 120.220182286 22.976075790                31
 2 2023-07-30 120.218036763 22.980068070                31
 3 2023-07-30 120.235449178 23.013736726                31
 4 2023-07-30 120.245579790 23.019322681                31
 5 2023-07-30 120.246524983 23.018206041                31
 6 2023-07-30 120.226597184 23.016947543                31
 7 2023-07-30 120.459990679 23.599175697                31
 8 2023-07-30 120.234949811 23.015998065                31
 9 2023-07-30 120.234949811 23.015998065                31
10 2023-07-30 120.247723767 23.011129228                31

7.2 Handle missing values

In our previous inspection of the CSV data, we noticed that certain rows exhibited “None” values in the X_COORDINATE and Y_COORDINATE columns. Now, we aim to quantify the number of such rows in the 2023 data.

none_column_X <- dengue_csv_confined[dengue_csv_confined$X_COORDINATE == "None", ]
none_column_Y <- dengue_csv_confined[dengue_csv_confined$Y_COORDINATE == "None", ]

Below are the missing rows:

print(none_column_X)
# A tibble: 14 × 4
   ONSET_DATE X_COORDINATE Y_COORDINATE EPIDEMIOLOGY_WEEK
   <date>     <chr>        <chr>                    <dbl>
 1 2023-08-18 None         None                        33
 2 2023-08-24 None         None                        34
 3 2023-09-15 None         None                        37
 4 2023-09-15 None         None                        37
 5 2023-09-19 None         None                        38
 6 2023-09-23 None         None                        38
 7 2023-09-26 None         None                        39
 8 2023-09-28 None         None                        39
 9 2023-09-30 None         None                        39
10 2023-10-17 None         None                        42
11 2023-10-21 None         None                        42
12 2023-10-26 None         None                        43
13 2023-10-28 None         None                        43
14 2023-12-08 None         None                        49
print(none_column_Y)
# A tibble: 14 × 4
   ONSET_DATE X_COORDINATE Y_COORDINATE EPIDEMIOLOGY_WEEK
   <date>     <chr>        <chr>                    <dbl>
 1 2023-08-18 None         None                        33
 2 2023-08-24 None         None                        34
 3 2023-09-15 None         None                        37
 4 2023-09-15 None         None                        37
 5 2023-09-19 None         None                        38
 6 2023-09-23 None         None                        38
 7 2023-09-26 None         None                        39
 8 2023-09-28 None         None                        39
 9 2023-09-30 None         None                        39
10 2023-10-17 None         None                        42
11 2023-10-21 None         None                        42
12 2023-10-26 None         None                        43
13 2023-10-28 None         None                        43
14 2023-12-08 None         None                        49

Given the uncertainty about the exact locations of the points with missing coordinates, it would be unwise to impute latitude or longitude values arbitrarily, as this could misrepresent the data. Since the missing values account for only 14 of 25,479 observations, removing them is a reasonable approach: the impact on the overall analysis is negligible.

Here, we eliminate rows where the coordinates are marked as “None” and convert the coordinates to a numeric type to ensure that they can be used for spatial operations and mapping:

dengue_csv_confined <- dengue_csv_confined %>%
  filter(X_COORDINATE != "None" & Y_COORDINATE != "None") %>%
  mutate(
    X_COORDINATE = as.numeric(X_COORDINATE),
    Y_COORDINATE = as.numeric(Y_COORDINATE)
  )
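As a safeguard, we can assert that the numeric conversion introduced no NAs; a minimal sketch:

```r
# Any coordinate that failed to parse as a number would now be NA;
# stop early if that happened
stopifnot(
  !anyNA(dengue_csv_confined$X_COORDINATE),
  !anyNA(dengue_csv_confined$Y_COORDINATE)
)
```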

7.3 Converting data into an sf object

Now, we can proceed with the conversion of the cleaned data into an sf data frame, assigning the WGS84 coordinate reference system to the raw coordinates and then transforming them to TWD97:

dengue_sf <- st_as_sf(dengue_csv_confined, coords = c("X_COORDINATE", "Y_COORDINATE"), crs = 4326) %>% st_transform(crs = 3824)
st_geometry(dengue_sf)
Geometry set for 25461 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 118.3081 ymin: 21.98551 xmax: 121.8983 ymax: 25.20237
Geodetic CRS:  TWD97
First 5 geometries:
head(dengue_sf)
Simple feature collection with 6 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 120.218 ymin: 22.97608 xmax: 120.2465 ymax: 23.01932
Geodetic CRS:  TWD97
# A tibble: 6 × 3
  ONSET_DATE EPIDEMIOLOGY_WEEK            geometry
  <date>                 <dbl>         <POINT [°]>
1 2023-07-30                31 (120.2202 22.97608)
2 2023-07-30                31  (120.218 22.98007)
3 2023-07-30                31 (120.2354 23.01374)
4 2023-07-30                31 (120.2456 23.01932)
5 2023-07-30                31 (120.2465 23.01821)
6 2023-07-30                31 (120.2266 23.01695)

We have successfully transformed the CSV data into an sf point data frame in the TWD97 coordinate system. Let’s do some initial visualizations of these points using plot() and tmap.

plot(dengue_sf, main = "Dengue Fever Cases (Epi Weeks 31-50, 2023)")

tmap_mode("plot")
tm_shape(dengue_sf) +
  tm_dots()

Note

It is important to note that the points depicted in the diagrams above haven’t been constrained to the study area; instead, they encompass the entire Tainan City. Further steps are required to refine the visualization and focus exclusively on the specified study area.

7.4 Confining dengue case layer within the study area

The dengue fever layer now needs to be restricted to the previously defined study area. This can be achieved by employing the st_intersection function. The st_intersection function essentially determines the common spatial elements between two layers, in this case, the dengue fever layer and the predefined study area. As a result of this operation, the output will consist of spatial data that exclusively pertains to the overlapping regions between the two layers.

The code below has been commented out due to its extended execution time, and instead, the resultant dengue fever layer has been loaded into the environment for further use.

#dengue_2023_tainan <- st_intersection(tainan_sf_confined, dengue_sf)
#write_rds(dengue_2023_tainan, "../../data/rds/dengue_2023_tainan.rds")
dengue_2023_tainan <- read_rds("../../data/rds/dengue_2023_tainan.rds")

Here is a glimpse of the resulting dengue fever layer:

head(dengue_2023_tainan, n=10)
Simple feature collection with 10 features and 12 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 120.1982 ymin: 22.97283 xmax: 120.2669 ymax: 23.04125
Geodetic CRS:  TWD97
       VILLCODE COUNTYNAME TOWNNAME VILLNAME         VILLENG COUNTYID
211 67000310027     臺南市   永康區   龍潭里    Longtan Vil.        D
175 67000340044     臺南市     北區   長勝里 Zhangsheng Vil.        D
145 67000310036     臺南市   永康區   尚頂里  Shangding Vil.        D
169 67000310041     臺南市   永康區   龍埔里     Longpu Vil.        D
241 67000310031     臺南市   永康區   六合里      Liuhe Vil.        D
19  67000310001     臺南市   永康區   五王里     Wuwang Vil.        D
63  67000320008     臺南市     東區   東光里  Dongguang Vil.        D
32  67000320032     臺南市     東區   崇德里    Chongde Vil.        D
182 67000370042     臺南市   中西區 小西門里  Xiaoximen Vil.        D
239 67000310026     臺南市   永康區   勝利里    Shengli Vil.        D
    COUNTYCODE TOWNID TOWNCODE NOTE ONSET_DATE EPIDEMIOLOGY_WEEK
211      67000    D39 67000310 <NA> 2023-07-29                31
175      67000    D04 67000340 <NA> 2023-07-29                31
145      67000    D39 67000310 <NA> 2023-07-29                31
169      67000    D39 67000310 <NA> 2023-07-29                31
241      67000    D39 67000310 <NA> 2023-07-29                31
19       67000    D39 67000310 <NA> 2023-07-29                31
63       67000    D01 67000320 <NA> 2023-07-29                31
32       67000    D01 67000320 <NA> 2023-07-29                31
182      67000    D08 67000370 <NA> 2023-07-29                31
239      67000    D39 67000310 <NA> 2023-07-29                31
                     geometry
211 POINT (120.2669 23.02954)
175  POINT (120.2196 23.0086)
145 POINT (120.2223 23.02192)
169 POINT (120.2594 23.04125)
241 POINT (120.2317 23.01115)
19  POINT (120.2357 23.01308)
63  POINT (120.2291 22.99008)
32  POINT (120.2265 22.97283)
182 POINT (120.1982 22.98234)
239 POINT (120.2323 23.00397)

7.5 Visualizing the dengue fever layer within the study area

Let’s visualize the dengue cases within the study area using tmap.

tmap_mode("plot")
tm_shape(tainan_sf_confined) +
 tm_borders(lwd = 1, col = "#C09984") +
  tm_shape(dengue_2023_tainan) +
  tm_dots(col = "#2B2B2B", size = 0.05) +
  tm_layout(title = "Dengue Cases within Study Area",
            title.position = c("center", "top"),
            title.size = 0.8,
            frame = FALSE)

tmap_mode('plot')
tm_shape(tainan_sf_confined) +
  tm_polygons("TOWNID") +
  tm_shape(dengue_2023_tainan) +
  tm_dots(col = "#2B2B2B", size = 0.05) +
  tm_layout(title = "Dengue Cases within Study Area",
            title.position = c("center", "top"),
            title.size = 0.8,
            frame = FALSE,
            legend.position = c("left", "bottom"),
            legend.title.size = 0.7,
            legend.text.size = 0.6)

From the visual analysis of the maps provided, a noticeable clustering pattern is evident in the central region of the study area. On the contrary, points located in the regions away from the center appear to be more dispersed.

8. Choropleth Mapping and Analysis

In this section, we will create and visualize a choropleth map representing the occurrences of dengue fever cases specifically within the confined Tainan study layer.

Before we visualize the map, some data processing needs to be done. First, we aggregate the dengue fever cases by village code, creating new data with the summarized information. The dengue fever cases are referred to as observations in the code. The result is then converted to a standard data frame and the geometry column is removed for a more straightforward join operation later.

grouped_data <- dengue_2023_tainan %>%
  group_by(VILLCODE) %>%
  summarize(OBSERVATIONS_SUM = n())

grouped_data <- as.data.frame(grouped_data)

grouped_data <- grouped_data %>%
  select(-geometry)
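The two-step conversion above can also be collapsed into a single call to sf’s st_drop_geometry(), which returns a plain data frame without the geometry column:

```r
# Equivalent to as.data.frame() followed by select(-geometry)
grouped_data <- dengue_2023_tainan %>%
  group_by(VILLCODE) %>%
  summarize(OBSERVATIONS_SUM = n()) %>%
  st_drop_geometry()
```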

Then, we merge this data with the confined Tainan study layer based on the village code. The result is a modified spatial data frame (dengue_2023_tainan) with additional information about the number of dengue fever observations for each village.

dengue_2023_tainan <- left_join(tainan_sf_confined, grouped_data, by = "VILLCODE")

# remove the NOTE column, which is irrelevant to this analysis
dengue_2023_tainan <- dengue_2023_tainan %>%
  select(-NOTE)

Let’s take a look at the result:

glimpse(dengue_2023_tainan)
Rows: 258
Columns: 11
$ VILLCODE         <chr> "67000350032", "67000270011", "67000370005", "6700033…
$ COUNTYNAME       <chr> "臺南市", "臺南市", "臺南市", "臺南市", "臺南市", "臺…
$ TOWNNAME         <chr> "安南區", "仁德區", "中西區", "南區", "安南區", "安南…
$ VILLNAME         <chr> "青草里", "保安里", "赤嵌里", "大成里", "城北里", "城…
$ VILLENG          <chr> "Qingcao Vil.", "Bao'an Vil.", "Chihkan Vil.", "Dache…
$ COUNTYID         <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D"…
$ COUNTYCODE       <chr> "67000", "67000", "67000", "67000", "67000", "67000",…
$ TOWNID           <chr> "D06", "D32", "D08", "D02", "D06", "D06", "D08", "D06…
$ TOWNCODE         <chr> "67000350", "67000270", "67000370", "67000330", "6700…
$ OBSERVATIONS_SUM <int> 2, 19, 111, 29, 1, 10, 38, 44, 112, 65, 28, 2, 3, 11,…
$ geometry         <POLYGON [°]> POLYGON ((120.1176 23.08387..., POLYGON ((120…

Now, we are ready to create our choropleth map! This map visualizes the distribution of dengue cases per village for epidemiology weeks 31 to 50 for the year 2023.

tmap_mode("plot")
tm_shape(dengue_2023_tainan) +
  tm_fill("OBSERVATIONS_SUM", 
          style = "quantile", 
          palette = "Purples",
          title = "Observations") +
  tm_layout(main.title = "Distribution of Dengue Cases per Village",
            main.title.position = "center",
            main.title.size = 0.8,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar() +
  tm_grid(alpha =0.2)

Similar to the previous observation in the tmap, we can discern a pattern where the outer regions of the confined map exhibit a lower number of recorded cases. As we move towards the central regions of the confined map, there is a noticeable escalation in the number of cases, with varying intensities, denoted by the darker gradients surrounding the center. In the outer regions, villages with low dengue fever case records are predominantly surrounded by other villages with similarly low cases. As we move towards the center, we observe instances where villages with high case numbers are surrounded by villages exhibiting both high and low case counts. To gain deeper insights, let’s conduct additional analyses.

9. Global Spatial Autocorrelation

Global spatial autocorrelation is a statistical measure that quantifies the degree to which similar values of a variable are clustered or dispersed across spatial units (e.g., regions, points, polygons). In simpler terms, it helps determine if neighboring locations tend to have similar or dissimilar values. The most common metric for global spatial autocorrelation is the Moran’s I statistic.
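For reference, with n spatial units, values x_i, mean x̄, and spatial weights w_ij, Moran’s I is defined as:

```latex
I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}}
    \cdot
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

Values of I above its expectation of −1/(n − 1) indicate clustering of similar values, while values below it indicate dispersion.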

9.1 Deriving contiguity with Queen’s method

Before proceeding with global spatial autocorrelation analysis, it is crucial to establish spatial contiguity, or in other words, neighborhood relationships among the spatial units in your study area. In this exercise, we will utilize the Queen’s contiguity method. The Queen’s contiguity method defines spatial contiguity by considering neighboring units that share a common border or vertex.

To ensure the validity of the analysis, the first step involves handling missing values. In this case, any missing data (NA) in the data frame will be dropped.

dengue_2023_tainan <- tidyr::drop_na(dengue_2023_tainan)
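The NAs removed here correspond to villages in the study area with no recorded dengue cases, which received NA for OBSERVATIONS_SUM during the earlier left join. Dropping them excludes those polygons from the neighbourhood structure; an alternative sketch, had we wanted to keep them, would have recoded the NAs as zero instead:

```r
# Alternative to dropping: treat "no recorded case" as a count of zero
dengue_2023_tainan <- dengue_2023_tainan %>%
  mutate(OBSERVATIONS_SUM = tidyr::replace_na(OBSERVATIONS_SUM, 0))
```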

Then, we derive contiguity:

wm_q <- dengue_2023_tainan %>%
  mutate(nb = st_contiguity(geometry),
         wt = st_weights(nb,
                         style = "W"),
         .before = 1) 
wm_q
Simple feature collection with 257 features and 12 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 120.0627 ymin: 22.89401 xmax: 120.2925 ymax: 23.09144
Geodetic CRS:  TWD97
First 10 features:
                                             nb
1                                   6, 118, 160
2                       126, 128, 138, 167, 221
3          68, 69, 171, 180, 183, 184, 187, 199
4                    94, 97, 100, 104, 181, 206
5                              12, 13, 248, 254
6                      1, 12, 13, 118, 160, 248
7                               54, 98, 99, 200
8  9, 73, 75, 115, 125, 144, 156, 157, 165, 185
9                         8, 110, 115, 125, 165
10                  11, 159, 161, 165, 235, 257
                                                                 wt    VILLCODE
1                                   0.3333333, 0.3333333, 0.3333333 67000350032
2                                           0.2, 0.2, 0.2, 0.2, 0.2 67000270011
3            0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125 67000370005
4  0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667 67000330004
5                                            0.25, 0.25, 0.25, 0.25 67000350028
6  0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667 67000350030
7                                            0.25, 0.25, 0.25, 0.25 67000370009
8                  0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 67000350017
9                                           0.2, 0.2, 0.2, 0.2, 0.2 67000350049
10 0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667, 0.1666667 67000350018
   COUNTYNAME TOWNNAME VILLNAME       VILLENG COUNTYID COUNTYCODE TOWNID
1      臺南市   安南區   青草里  Qingcao Vil.        D      67000    D06
2      臺南市   仁德區   保安里   Bao'an Vil.        D      67000    D32
3      臺南市   中西區   赤嵌里  Chihkan Vil.        D      67000    D08
4      臺南市     南區   大成里  Dacheng Vil.        D      67000    D02
5      臺南市   安南區   城北里 Chengbei Vil.        D      67000    D06
6      臺南市   安南區   城南里 Chengnan Vil.        D      67000    D06
7      臺南市   中西區   法華里    Fahua Vil.        D      67000    D08
8      臺南市   安南區   海南里   Hainan Vil.        D      67000    D06
9      臺南市   安南區   國安里   Guo'an Vil.        D      67000    D06
10     臺南市   安南區   溪心里    Xixin Vil.        D      67000    D06
   TOWNCODE OBSERVATIONS_SUM                       geometry
1  67000350                2 POLYGON ((120.1176 23.08387...
2  67000270               19 POLYGON ((120.2304 22.93544...
3  67000370              111 POLYGON ((120.2012 22.99966...
4  67000330               29 POLYGON ((120.1985 22.98147...
5  67000350                1 POLYGON ((120.1292 23.06512...
6  67000350               10 POLYGON ((120.1246 23.06904...
7  67000370               38 POLYGON ((120.2094 22.98452...
8  67000350               44 POLYGON ((120.175 23.02218,...
9  67000350              112 POLYGON ((120.1866 23.02766...
10 67000350               65 POLYGON ((120.1834 23.06086...

9.2 Performing global Moran’s I test

In the following code snippet, the global_moran_test() function from the sfdep package is used to perform the global Moran’s I test. It wraps the corresponding spdep functionality, but takes the variable, the neighbour list, and the weights list as separate arguments rather than a listw object.

The hypotheses are as follows:

  • H0: There is no spatial autocorrelation in the data, and any observed spatial pattern is a result of random chance.

  • H1: There is a significant positive spatial autocorrelation in the data, indicating that similar values are more likely to be close to each other than would be expected by random chance.

  • Significance level: 0.05

global_moran_test(wm_q$OBSERVATIONS_SUM,
                       wm_q$nb,
                       wm_q$wt)

    Moran I test under randomisation

data:  x  
weights: listw    

Moran I statistic standard deviate = 12.703, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.464345173      -0.003906250       0.001358755 

In the output, the p-value is reported as < 2.2e-16, which is essentially zero. Since this is far below the significance level of 0.05, we reject the null hypothesis: there is significant positive spatial autocorrelation in the data. In addition, the positive Moran’s I statistic of 0.464 indicates spatial clustering, suggesting that similar values are located near each other. In other words, villages with similar levels of dengue cases tend to be located near each other on the map.

9.3 Performing global Moran’s I permutation test

To enhance the reliability of our results, we conduct the global Moran’s I permutation test. Prior to running the simulation, we use set.seed() to ensure the reproducibility of the computations.

set.seed(1234)
global_moran_perm(wm_q$OBSERVATIONS_SUM,
                       wm_q$nb,
                       wm_q$wt,
                  nsim = 99)

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 100 

statistic = 0.46435, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided

In a Monte Carlo simulation of Moran’s I with 99 simulations, the observed Moran’s I statistic is 0.46435, indicating clustering of data points. The p-value is reported as < 2.2e-16, which is smaller than the alpha value of 0.05, indicating statistical significance. The alternative hypothesis is two-sided, so the observed spatial pattern is significantly different from what would be expected under spatial randomness. Therefore, we reject the null hypothesis of no spatial autocorrelation, confirming the presence of a significant spatial pattern in the data.

10. Local Spatial Autocorrelation

Local Spatial Autocorrelation, often measured using indices like Local Moran’s I or Getis-Ord Gi*, assesses the spatial clustering patterns at a local level within a study area. Unlike global spatial autocorrelation, which provides an overall measure of spatial pattern across the entire area, local spatial autocorrelation focuses on identifying clusters, outliers, and areas of localized similarity or dissimilarity.

Local Moran’s I

Local Moran’s I calculates the correlation between the value of a variable at a specific location and the average value of that variable in its neighboring locations.
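Concretely, for unit i with value x_i, variable mean x̄, and spatial weights w_ij, the local Moran statistic can be written as follows (the exact variance scaling differs slightly between implementations):

```latex
I_i = \frac{x_i - \bar{x}}{S^2} \sum_{j} w_{ij}\,(x_j - \bar{x}),
\qquad
S^2 = \frac{1}{n-1} \sum_{j} (x_j - \bar{x})^2
```

A large positive I_i indicates that unit i sits in a cluster of similar values (high-high or low-low), while a negative I_i flags an outlier surrounded by dissimilar values.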

10.1 Computing local Moran’s I

lisa <- wm_q %>% 
  mutate(local_moran = local_moran(
    OBSERVATIONS_SUM, nb, wt, nsim = 99),
         .before = 1) %>%
  unnest(local_moran)

Let’s take a look at the content of the result.

lisa
Simple feature collection with 257 features and 24 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 120.0627 ymin: 22.89401 xmax: 120.2925 ymax: 23.09144
Geodetic CRS:  TWD97
# A tibble: 257 × 25
        ii      eii  var_ii   z_ii    p_ii p_ii_sim p_folded_sim skewness
     <dbl>    <dbl>   <dbl>  <dbl>   <dbl>    <dbl>        <dbl>    <dbl>
 1  1.40    0.0271  0.511    1.92  0.0546      0.02         0.01   -0.473
 2  0.756   0.0142  0.151    1.91  0.0563      0.06         0.03   -0.764
 3  0.0324 -0.0306  0.0558   0.267 0.790       0.72         0.36    0.797
 4 -0.215  -0.0200  0.115   -0.576 0.565       0.56         0.28   -0.447
 5  1.43   -0.0628  0.386    2.40  0.0163      0.02         0.01   -0.899
 6  1.29    0.0420  0.185    2.91  0.00365     0.02         0.01   -0.325
 7  0.326   0.0249  0.0917   0.995 0.320       0.34         0.17   -1.14 
 8  0.0212 -0.0240  0.0253   0.284 0.776       0.88         0.44   -0.365
 9  0.402   0.0515  0.108    1.07  0.286       0.36         0.18    0.140
10  0.0871 -0.00205 0.00315  1.59  0.112       0.08         0.04   -0.435
# ℹ 247 more rows
# ℹ 17 more variables: kurtosis <dbl>, mean <fct>, median <fct>, pysal <fct>,
#   nb <nb>, wt <list>, VILLCODE <chr>, COUNTYNAME <chr>, TOWNNAME <chr>,
#   VILLNAME <chr>, VILLENG <chr>, COUNTYID <chr>, COUNTYCODE <chr>,
#   TOWNID <chr>, TOWNCODE <chr>, OBSERVATIONS_SUM <int>,
#   geometry <POLYGON [°]>

After unnesting, the result of local_moran() is an sf data frame containing the columns ii, eii, var_ii, z_ii, p_ii, p_ii_sim, and p_folded_sim.

Note

ii: Observed Local Moran’s I statistic.

eii: Expected value under spatial randomness.

var_ii: Variance of Local Moran’s I.

z_ii: Standard deviate of observed value.

p_ii: Two-tailed p-value for observed value.

p_ii_sim: Simulated two-tailed p-value.

p_folded_sim: Folded simulated p-value, useful for asymmetrical distributions.

10.2 Visualising local Moran’s I and p-value of local Moran’s I

Here is a choropleth map depicting the observed local Moran’s I values.

tmap_mode("plot")
tm_shape(lisa) +
  tm_fill("ii",  palette = "RdBu") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "Local Moran's I of Frequency of Dengue Cases",
            main.title.size = 0.8)

The choropleth shows there is evidence for both positive and negative ii values. However, it is useful to consider the p-values for each of these ii values. Here is a choropleth map depicting the p-value of local Moran’s I.

tmap_mode("plot")
tm_shape(lisa) +
  tm_fill("p_ii_sim", palette = "RdBu") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
   tm_layout(main.title = "p-value of Local Moran's I",
            main.title.size = 0.8)

Note

For p-values, the appropriate classification should be 0.001, 0.01, 0.05 and not significant instead of the default classification. This will be corrected in the comparison below.

For effective comparison, we plot the maps next to one another. A map showing the town boundaries (TOWNID) is also included to aid the analysis.

tmap_mode("plot")
map1 <- tm_shape(lisa) +
  tm_fill("ii", palette = "RdBu") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "Local Moran's I of Frequency of Dengue Cases",
            main.title.size = 0.8)

map2 <- tm_shape(lisa) +
  tm_fill("p_ii", palette = "RdBu",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
          labels = c("< 0.001", "0.001 - 0.01", "0.01 - 0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "p-value of Local Moran's I",
            main.title.size = 0.8)

tmap_arrange(map1, map2, ncol = 2)

tmap_mode('plot')
tm_shape(tainan_sf_confined) +
  tm_polygons("TOWNID")

Upon comparing the two maps, it becomes evident that numerous villages exhibit positive Local Moran’s I values, indicative of clustering. However, the accompanying non-significant p-values of these same villages imply that this observed clustering may be purely coincidental, lacking statistical support for a systematic spatial pattern.

Furthermore, it can be noted that places exhibiting higher Local Moran’s I values also coincide with those having significant p-values. The combination of elevated Local Moran’s I values and significant p-values further strengthens the indication that these areas possess distinct spatial characteristics contributing to non-random clustering. These villages are located in towns D01, D02, D04, D06, D32 and D39.

10.3 Visualizing LISA map

The LISA map is a categorical representation illustrating outliers and clusters within a geographic area. Outliers come in two types: High-Low (HL) and Low-High (LH), each indicating areas with values significantly differing from their neighbors. Similarly, clusters are categorized as High-High (HH) and Low-Low (LL), representing spatial concentrations of similar values.
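As a hedged illustration of how these four labels arise, the helper below is hypothetical (not part of sfdep): each label follows from the signs of a location's standardized value and its spatially lagged value.

```r
# Hypothetical helper: assign a LISA quadrant label from the sign of a
# location's standardized value (z) and of its spatial lag (lag_z).
lisa_quadrant <- function(z, lag_z) {
  if (z >= 0 && lag_z >= 0) "High-High"      # cluster of high values
  else if (z < 0 && lag_z < 0) "Low-Low"     # cluster of low values
  else if (z >= 0 && lag_z < 0) "High-Low"   # high outlier among low neighbours
  else "Low-High"                            # low outlier among high neighbours
}
lisa_quadrant(1.2, 0.8)   # "High-High"
lisa_quadrant(-0.5, 1.1)  # "Low-High"
```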

lisa_sig <- lisa  %>%
  filter(p_ii < 0.05)
tmap_mode("view")
tm_shape(lisa) +
  tm_polygons(id="VILLENG") +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sig) +
  tm_fill("mean", id="VILLENG") + 
  tm_borders(alpha = 0.4)
tmap_mode('plot')
tm_shape(tainan_sf_confined) +
  tm_polygons("TOWNID")

The LISA map highlights distinct spatial clusters across the study area. In the outer regions, towns such as D06, D32, and D39 stand out with prevalent Low-Low clusters, indicating areas consistently characterized by lower dengue cases surrounded by similar low cases. This spatial pattern aligns seamlessly with our earlier observations from the choropleth map analysis of dengue cases per village in Section 8. Moving towards the central regions, a notable prevalence of High-High clusters becomes apparent. These clusters, observed in towns D01, D02, and the inner areas of towns D06 and D39, signify localized areas consistently exhibiting higher dengue cases surrounded by neighboring areas with high cases. Only a few villages fall into the low-high category, indicating limited instances of outliers with higher cases surrounded by lower cases. By clicking on individual villages, we are able to reveal their respective names and know whether they are clusters/outliers.

Hot Spot and Cold Spot Area Analysis

Hot Spot and Cold Spot Area Analysis employs spatial weights to pinpoint statistically significant hot spots and cold spots in a spatially weighted attribute. The method identifies locations with values that exhibit spatial clustering or dispersion in proximity to each other, determined by a calculated distance.
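For reference, the Getis-Ord Gi* statistic is commonly written as:

```latex
G_i^* = \frac{\sum_{j} w_{ij} x_j - \bar{X} \sum_{j} w_{ij}}
             {S \sqrt{\dfrac{n \sum_{j} w_{ij}^2 - \left(\sum_{j} w_{ij}\right)^2}{n - 1}}},
\qquad
S = \sqrt{\frac{\sum_{j} x_j^2}{n} - \bar{X}^2}
```

where the sums run over all locations $j$ including $i$ itself (the "star" in Gi*), $\bar{X}$ is the mean of the attribute, and $w_{ij}$ are the spatial weights.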

10.4 Derive spatial weight matrix

Local Gi* is a distance-based spatial statistic, so the spatial weights should reflect distance rather than simple binary contiguity. Here, contiguity-based neighbours are combined with inverse-distance weights, so nearer neighbours contribute more than distant ones.

wm_idw <- dengue_2023_tainan %>%
  mutate(nb = st_contiguity(geometry),
         wts = st_inverse_distance(nb, geometry,
                                   scale = 1,
                                   alpha = 1),
         .before = 1)

10.5 Computing local Gi* statistics

Using the new spatial weight matrix, we compute the local Gi* statistics.

HCSA <- wm_idw %>% 
  mutate(local_Gi = local_gstar_perm(
    OBSERVATIONS_SUM, nb, wts, nsim = 99),
         .before = 1) %>%
  unnest(local_Gi)
HCSA
Simple feature collection with 257 features and 20 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 120.0627 ymin: 22.89401 xmax: 120.2925 ymax: 23.09144
Geodetic CRS:  TWD97
# A tibble: 257 × 21
   gi_star    e_gi   var_gi p_value   p_sim p_folded_sim skewness kurtosis nb   
     <dbl>   <dbl>    <dbl>   <dbl>   <dbl>        <dbl>    <dbl>    <dbl> <nb> 
 1  -2.35  0.00295  1.96e-6 -1.93   0.0539          0.02     0.01   0.419  <int>
 2  -2.06  0.00334  1.39e-6 -1.73   0.0829          0.04     0.02   1.08   <int>
 3   0.354 0.00424  9.50e-7  0.0182 0.986           0.98     0.49   0.320  <int>
 4   0.365 0.00362  1.38e-6  0.591  0.555           0.58     0.29   0.489  <int>
 5  -2.65  0.00292  1.00e-6 -2.68   0.00744         0.02     0.01   0.877  <int>
 6  -3.16  0.00351  1.18e-6 -3.03   0.00244         0.02     0.01  -0.0124 <int>
 7  -1.25  0.00341  1.61e-6 -0.983  0.325           0.32     0.16   0.655  <int>
 8  -0.284 0.00386  8.09e-7 -0.251  0.802           0.88     0.44   0.433  <int>
 9   1.53  0.00437  1.43e-6  1.20   0.229           0.32     0.16   0.503  <int>
10  -1.47  0.00398  1.31e-6 -1.57   0.117           0.08     0.04   0.391  <int>
# ℹ 247 more rows
# ℹ 12 more variables: wts <list>, VILLCODE <chr>, COUNTYNAME <chr>,
#   TOWNNAME <chr>, VILLNAME <chr>, VILLENG <chr>, COUNTYID <chr>,
#   COUNTYCODE <chr>, TOWNID <chr>, TOWNCODE <chr>, OBSERVATIONS_SUM <int>,
#   geometry <POLYGON [°]>

10.6 Visualizing Gi* and p-value of HCSA

Here is a choropleth map depicting the Gi* values.

tmap_mode("plot")
tm_shape(HCSA) +
  tm_fill("gi_star", palette = "RdBu") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "Gi* of Frequency of Dengue Cases",
            main.title.size = 0.8,
            main.title.position = "center")

Below is the choropleth map depicting the p-value of Gi* in these villages.

tmap_mode("plot")
tm_shape(HCSA) +
  tm_fill("p_sim", palette = "RdBu") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "p-value of Gi*",
            main.title.size = 0.8,
            main.title.position = "center")

For effective comparison, we plot the maps next to one another.

tmap_mode("plot")
map1 <- tm_shape(HCSA) +
  tm_fill("gi_star", palette = "RdBu") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "Gi* of Frequency of Dengue Cases",
            main.title.size = 0.8,
            main.title.position = "center")

map2 <- tm_shape(HCSA) +
  tm_fill("p_sim", palette = "RdBu",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
          labels = c("< 0.001", "0.001 - 0.01", "0.01 - 0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of Gi*",
            main.title.size = 0.8,
            main.title.position = "center")

tmap_arrange(map1, map2, ncol = 2)

10.7 Visualising hot spot and cold spot areas

Now, we can visualize the statistically significant hot spot and cold spot areas by utilizing the relevant tmap functions.

HCSA_sig <- HCSA  %>%
  filter(p_sim < 0.05)
tmap_mode("view")
tm_shape(HCSA) +
  tm_polygons(id="VILLENG") +
  tm_borders(alpha = 0.5) +
tm_shape(HCSA_sig) +
  tm_fill("gi_star", id="VILLENG") + 
  tm_borders(alpha = 0.4)

Regions toward the edges of the confined map display consistently low Gi* values, indicative of statistically significant cold spots for dengue fever cases. This spatial pattern suggests a lower risk or prevalence of the disease in the peripheral areas of Tainan. Conversely, positive Gi* values, representing hot spots, are concentrated in the central vicinity of the map. This clustering of hot spots implies areas with an elevated risk of dengue, warranting heightened public health interventions and monitoring efforts to effectively control the potential spread of the disease. By clicking on individual villages, we are able to reveal their respective names and Gi* values.

11. Emerging Hot Spot Analysis

Emerging Hot Spot Analysis (EHSA) is a spatio-temporal analysis method designed to find and describe the dynamic evolution of hot spot and cold spot areas over time.

11.1 Creating a time series cube

First, we make a time series cube.

dengue_2023_tainan_2 <- read_rds("../../data/rds/dengue_2023_tainan.rds")
# drop the NOTE column because it is irrelevant to this analysis
dengue_2023_tainan_2 <- dengue_2023_tainan_2 %>%
  select(-NOTE)

For this, we will need all possible combinations of unique village codes and epidemiology weeks, allowing us to analyze and explore spatio-temporal patterns across different villages and weeks.
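As a toy sketch of this step (with made-up codes, not the real VILLCODE values):

```r
# Toy example: expand.grid builds every (village, week) pair, so weeks
# with zero reported cases are still represented in the grid.
combos <- expand.grid(VILLCODE = c("A", "B"),
                      EPIDEMIOLOGY_WEEK = 31:32)
nrow(combos)  # 4 pairs: 2 villages x 2 weeks
```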

village_week_combinations <- expand.grid(VILLCODE = unique(tainan_sf_confined$VILLCODE), EPIDEMIOLOGY_WEEK = seq(31, 50))

Next we prepare a dataset containing information about the number of dengue fever observations for each combination of village code and epidemiology week. The code also ensures those with zero observations are accounted for in the analysis.

grouped_data_2 <- dengue_2023_tainan_2 %>%
  group_by(VILLCODE, EPIDEMIOLOGY_WEEK) %>%
  summarize(OBSERVATIONS = n())

complete_data <- left_join(village_week_combinations,
                           grouped_data_2,
                           by = c("VILLCODE", "EPIDEMIOLOGY_WEEK")) %>%
  replace(is.na(.), 0)  # Replace NA values with 0

complete_data <- complete_data %>%
  select(-geometry)
complete_data <- as_tibble(complete_data)

Here we create the spacetime cube.

OBSERVATIONS_st <- spacetime(complete_data, tainan_sf_confined,
                      .loc_col = "VILLCODE",
                      .time_col = "EPIDEMIOLOGY_WEEK")

Then, we verify if the resulting object is recognized as a spacetime cube:

is_spacetime_cube(OBSERVATIONS_st)
[1] TRUE

11.2 Deriving spatial weights

Next, we derive a spatial weight matrix with inverse distance weights. Note the use of include_self(), which adds each village to its own neighbour list, as required for the Gi* statistic.

OBSERVATIONS_nb <- OBSERVATIONS_st %>%
  activate("geometry") %>%
  mutate(nb = include_self(st_contiguity(geometry)),
         wt = st_inverse_distance(nb, geometry,
                                  scale = 1,
                                  alpha = 1),
         .before = 1) %>%
  set_nbs("nb") %>%
  set_wts("wt")

Here’s a look at the resulting object.

head(OBSERVATIONS_nb)
# A tibble: 6 × 5
  VILLCODE    EPIDEMIOLOGY_WEEK OBSERVATIONS nb        wt       
  <chr>                   <int>        <dbl> <list>    <list>   
1 67000350032                31            0 <int [4]> <dbl [4]>
2 67000270011                31            1 <int [6]> <dbl [6]>
3 67000370005                31            0 <int [9]> <dbl [9]>
4 67000330004                31            0 <int [7]> <dbl [7]>
5 67000350028                31            0 <int [5]> <dbl [5]>
6 67000350030                31            0 <int [8]> <dbl [8]>

11.3 Computing Gi*

Now, we can compute the local Gi* for each village, grouped by epidemiology week.

gi_stars <- OBSERVATIONS_nb %>% 
  group_by(EPIDEMIOLOGY_WEEK) %>% 
  mutate(gi_star = local_gstar_perm(
    OBSERVATIONS, nb, wt)) %>% 
  tidyr::unnest(gi_star)
gi_stars
# A tibble: 5,160 × 13
# Groups:   EPIDEMIOLOGY_WEEK [20]
   VILLCODE   EPIDEMIOLOGY_WEEK OBSERVATIONS nb    wt    gi_star    e_gi  var_gi
   <chr>                  <int>        <dbl> <lis> <lis>   <dbl>   <dbl>   <dbl>
 1 670003500…                31            0 <int> <dbl> -0.771  0.00227 1.22e-5
 2 670002700…                31            1 <int> <dbl>  0.366  0.00385 1.04e-5
 3 670003700…                31            0 <int> <dbl> -0.382  0.00295 8.23e-6
 4 670003300…                31            0 <int> <dbl> -0.180  0.00324 1.41e-5
 5 670003500…                31            0 <int> <dbl> -0.846  0.00261 1.83e-5
 6 670003500…                31            0 <int> <dbl> -1.04   0.00284 7.71e-6
 7 670003700…                31            0 <int> <dbl> -0.846  0.00260 1.50e-5
 8 670003500…                31            0 <int> <dbl> -0.662  0.00318 6.44e-6
 9 670003500…                31            0 <int> <dbl>  0.0812 0.00257 1.12e-5
10 670003500…                31            0 <int> <dbl> -0.981  0.00289 1.46e-5
# ℹ 5,150 more rows
# ℹ 5 more variables: p_value <dbl>, p_sim <dbl>, p_folded_sim <dbl>,
#   skewness <dbl>, kurtosis <dbl>

12. Mann-Kendall Test

The Mann-Kendall test is a non-parametric statistical test used to assess the presence of a monotonic trend in a time series. We will perform the Mann-Kendall test on each village’s Gi* series to determine whether the trend in dengue cases is statistically significant. We first conduct this test on the top 3 villages with the highest number of dengue cases.

The hypotheses are as follows:

  • H0: There is no trend in the data over time.

  • H1: There is the presence of a trend, either increasing or decreasing.

  • Significance value: 0.05
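As a hedged, self-contained illustration using only base R: cor.test() with method = "kendall" returns the same tau and, for short untied series, an exact two-sided p-value playing the role of sl below. The series here is made up.

```r
# Illustrative only: Kendall's tau between a made-up series and time.
# A strictly increasing series has tau = 1 (a perfect upward trend).
y <- c(1, 2, 4, 7, 11, 16)
t <- seq_along(y)
res <- cor.test(t, y, method = "kendall")
res$estimate  # tau = 1
res$p.value   # small p-value: such a trend is unlikely under the no-trend null
```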

The code below assesses the trend for Fuhua Village, Anfu Village and Xiqi Village.

fuhua <- gi_stars %>% 
  ungroup() %>% 
  filter(VILLCODE == "67000310037") %>% 
  select(VILLCODE, EPIDEMIOLOGY_WEEK, gi_star)
anfu <- gi_stars %>% 
  ungroup() %>% 
  filter(VILLCODE == "67000350050") %>% 
  select(VILLCODE, EPIDEMIOLOGY_WEEK, gi_star)
xiqi <- gi_stars %>% 
  ungroup() %>% 
  filter(VILLCODE == "67000350040") %>% 
  select(VILLCODE, EPIDEMIOLOGY_WEEK, gi_star)

Next, we plot the result using ggplot2 functions.

f <- ggplot(data = fuhua, aes(x = EPIDEMIOLOGY_WEEK, y = gi_star)) +
  geom_line(color = "black", size = 1, alpha = 0.8) +
  geom_point(color = "blue", size = 1.5) +  # Add points for emphasis
  labs(title = "Trend of Dengue Cases in Fuhua Village",
       x = "Epidemiology Week",
       y = "Local Gi* Value") +
  theme_minimal() +
  theme(plot.title = element_text(size = 8),
        axis.title = element_text(size = 8),
        axis.text = element_text(size = 8),
        plot.caption = element_text(size = 8, color = "gray"))

ggplotly(f)
a <- ggplot(data = anfu, aes(x = EPIDEMIOLOGY_WEEK, y = gi_star)) +
  geom_line(color = "black", size = 1, alpha = 0.8) +
  geom_point(color = "blue", size = 1.5) +  # Add points for emphasis
  labs(title = "Trend of Dengue Cases in Anfu Village",
       x = "Epidemiology Week",
       y = "Local Gi* Value") +
  theme_minimal() +
  theme(plot.title = element_text(size = 8),
        axis.title = element_text(size = 8),
        axis.text = element_text(size = 8),
        plot.caption = element_text(size = 8, color = "gray"))

ggplotly(a)
x <- ggplot(data = xiqi, aes(x = EPIDEMIOLOGY_WEEK, y = gi_star)) +
  geom_line(color = "black", size = 1, alpha = 0.8) +
  geom_point(color = "blue", size = 1.5) +  # Add points for emphasis
  labs(title = "Trend of Dengue Cases in Xiqi Village",
       x = "Epidemiology Week",
       y = "Local Gi* Value") +
  theme_minimal() +
  theme(plot.title = element_text(size = 8),
        axis.title = element_text(size = 8),
        axis.text = element_text(size = 8),
        plot.caption = element_text(size = 8, color = "gray"))

ggplotly(x)

Now, we take a look at the sl value, which is the two-sided p-value reported by the Mann-Kendall test.

fuhua %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>% 
  tidyr::unnest_wider(mk)
# A tibble: 1 × 5
     tau    sl     S     D  varS
   <dbl> <dbl> <dbl> <dbl> <dbl>
1 -0.232 0.163   -44  190.   950

The tau value of -0.232 is negative, suggesting a slight downward trend in the gi_star values over the period. However, the sl value (p-value) of 0.1629844 is greater than the chosen significance level of 0.05, so this trend is not statistically significant.

anfu %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>% 
  tidyr::unnest_wider(mk)
# A tibble: 1 × 5
     tau       sl     S     D  varS
   <dbl>    <dbl> <dbl> <dbl> <dbl>
1 -0.589 0.000317  -112  190.   950

The tau value of -0.589 is negative, indicating a downward trend in the gi_star values over the period. The sl value (p-value) of 0.0003166109 is smaller than the chosen significance level of 0.05, so this trend is statistically significant.

xiqi %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>% 
  tidyr::unnest_wider(mk)
# A tibble: 1 × 5
     tau        sl     S     D  varS
   <dbl>     <dbl> <dbl> <dbl> <dbl>
1 -0.642 0.0000865  -122  190.   950

The tau value of -0.642 is negative, indicating a downward trend in the gi_star values over the period. The sl value (p-value) of 0.00008645681 is smaller than the chosen significance level of 0.05, so this trend is statistically significant.

We can replicate this process for each village by using the group_by() function of the dplyr package.

all_vill <- gi_stars %>%
  group_by(VILLCODE) %>%
  summarise(mk = list(
    unclass(
      Kendall::MannKendall(gi_star)))) %>%
  tidyr::unnest_wider(mk)
mk_all <- all_vill %>% 
  arrange(sl, abs(tau)) %>% 
  slice(1:5)
mk_all
# A tibble: 5 × 6
  VILLCODE       tau        sl     S     D  varS
  <chr>        <dbl>     <dbl> <dbl> <dbl> <dbl>
1 67000310001 -0.716 0.0000119  -136  190.   950
2 67000320040 -0.705 0.0000160  -134  190.   950
3 67000310025 -0.684 0.0000285  -130  190.   950
4 67000320004 -0.653 0.0000659  -124  190.   950
5 67000350040 -0.642 0.0000865  -122  190.   950

13. Emerging Hotspot Analysis

13.1 Performing emerging hotspot analysis

As defined earlier, Emerging Hot Spot Analysis (EHSA) identifies and describes areas where significant clustering of hot or cold spots changes over time. We run it on the spacetime cube with 99 simulations.

ehsa <- emerging_hotspot_analysis(
  x = OBSERVATIONS_st, 
  .var = "OBSERVATIONS", 
  k = 1, 
  nsim = 99
)

13.2 Visualising the distribution of EHSA classes

Next, we use ggplot2 functions to reveal the distribution of EHSA classes in the form of a bar chart.

ggplot(data = ehsa,
       aes(x = classification)) +
  geom_bar()

Based on the bar chart, oscillating hot spots form the largest class, followed by oscillating cold spots as the second largest.

13.3 Visualising EHSA

tainan_ehsa <- tainan_sf_confined %>%
  left_join(ehsa,
            by = join_by(VILLCODE == location))
ehsa_sig <- tainan_ehsa  %>%
  filter(p_value < 0.05)
tmap_mode("view")
tm_shape(tainan_ehsa) +
  tm_polygons(id="VILLENG") +
  tm_borders(alpha = 0.5) +
tm_shape(ehsa_sig) +
  tm_fill("classification", id="VILLENG") + 
  tm_borders(alpha = 0.4)

The majority of the identified patterns exhibit oscillating hotspots. These statistically significant hotspots, predominantly situated in the northern part of the confined map (towns D06 and D39), were observed during the final weeks of the dengue outbreak. Notably, these areas have a historical context of being statistically significant cold spots during prior weeks. Conversely, oscillating cold spots are more prominent in the southern regions of the confined map (towns D02 and D32) and exhibit the opposite characteristics. These areas were statistically significant hot spots in the earlier weeks and transformed into cold spots in the later weeks of the dengue outbreak. By clicking on individual villages, we are able to reveal their respective names and the classification of specific villages.

14. References

  • Learning from senior: AILYS TEE XYNYN and JENNIFER POERNOMO.

  • In-class exercise 5

  • ChatGPT